1. Introduction

Ten real-valued features are computed for each cell nucleus:

radius (mean of distances from the center to points on the perimeter); texture (standard deviation of gray-scale values); perimeter; area; smoothness (local variation in radius lengths); compactness (perimeter^2 / area - 1.0); concavity (severity of concave portions of the contour); concave points (number of concave portions of the contour); symmetry; fractal dimension (“coastline approximation” - 1).

The mean, standard error (SE) and “worst” or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. We will analyze the features to understand their predictive value for diagnosis. We will then build models using several different algorithms and use them to predict the diagnosis.
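As a rough illustration of how the three summaries are derived for one feature, the sketch below takes made-up radius values for the nuclei in a single image and computes the mean, the standard error and the "worst" value. The numbers and the helper name `summarise_feature` are hypothetical, and the SE is assumed to be the usual sample standard deviation divided by the square root of n.

```r
# Hypothetical radius measurements for the nuclei found in one image
radius <- c(11.2, 12.8, 13.5, 14.1, 15.0, 17.3, 18.9)

# Summarise one feature as the description states:
# mean, standard error, and "worst" (mean of the three largest values)
summarise_feature <- function(x) {
  c(
    mean  = mean(x),
    se    = sd(x) / sqrt(length(x)),   # assumed SE definition
    worst = mean(sort(x, decreasing = TRUE)[1:3])
  )
}

radius_summary <- summarise_feature(radius)
print(round(radius_summary, 3))
```

Applying the same helper to each of the ten features would give the 30 columns described above.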

2. Setting Up R

2-1) Packages Used


3. Importing, Cleaning and Inspecting

3-1) Import the dataset

3-2) Remove NULL data

3-3) Reshape the datasets
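The cleaning and reshaping steps might look roughly like the sketch below. It runs on a tiny made-up data frame; the `id` column, the empty column `X` and the raw "M"/"B" coding are assumptions for illustration, since only the cleaned result is shown in this report.

```r
# Toy stand-in for the imported data: an id column, the diagnosis,
# two features, and an empty trailing column (all illustrative)
wbcd_raw <- data.frame(
  id = 1:4,
  diagnosis = c("M", "B", "B", "M"),
  radius_mean = c(18.0, 11.4, 12.1, 20.6),
  radius_se = c(1.10, 0.50, 0.43, 0.54),
  X = NA
)

# Drop columns that contain nothing but missing values
all_missing <- vapply(wbcd_raw, function(col) all(is.na(col)), logical(1))
wbcd <- wbcd_raw[, !all_missing, drop = FALSE]

# Drop the identifier and turn the diagnosis into a labelled factor
wbcd$id <- NULL
wbcd$diagnosis <- factor(wbcd$diagnosis,
                         levels = c("B", "M"),
                         labels = c("Benign", "Malignant"))

# Group the feature columns by their name suffix
mean_cols <- grep("_mean$", names(wbcd), value = TRUE)
se_cols   <- grep("_se$", names(wbcd), value = TRUE)

str(wbcd)
```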

3-4) Inspect the datasets

A quick look at the data before moving on to the analysis; this step is not needed for the final report.

structure

## 'data.frame':    569 obs. of  31 variables:
##  $ diagnosis              : Factor w/ 2 levels "Benign","Malignant": 2 2 2 2 2 2 2 2 2 2 ...
##  $ radius_mean            : num  18 20.6 19.7 11.4 20.3 ...
##  $ texture_mean           : num  10.4 17.8 21.2 20.4 14.3 ...
##  $ perimeter_mean         : num  122.8 132.9 130 77.6 135.1 ...
##  $ area_mean              : num  1001 1326 1203 386 1297 ...
##  $ smoothness_mean        : num  0.1184 0.0847 0.1096 0.1425 0.1003 ...
##  $ compactness_mean       : num  0.2776 0.0786 0.1599 0.2839 0.1328 ...
##  $ concavity_mean         : num  0.3001 0.0869 0.1974 0.2414 0.198 ...
##  $ concave.points_mean    : num  0.1471 0.0702 0.1279 0.1052 0.1043 ...
##  $ symmetry_mean          : num  0.242 0.181 0.207 0.26 0.181 ...
##  $ fractal_dimension_mean : num  0.0787 0.0567 0.06 0.0974 0.0588 ...
##  $ radius_se              : num  1.095 0.543 0.746 0.496 0.757 ...
##  $ texture_se             : num  0.905 0.734 0.787 1.156 0.781 ...
##  $ perimeter_se           : num  8.59 3.4 4.58 3.44 5.44 ...
##  $ area_se                : num  153.4 74.1 94 27.2 94.4 ...
##  $ smoothness_se          : num  0.0064 0.00522 0.00615 0.00911 0.01149 ...
##  $ compactness_se         : num  0.049 0.0131 0.0401 0.0746 0.0246 ...
##  $ concavity_se           : num  0.0537 0.0186 0.0383 0.0566 0.0569 ...
##  $ concave.points_se      : num  0.0159 0.0134 0.0206 0.0187 0.0188 ...
##  $ symmetry_se            : num  0.03 0.0139 0.0225 0.0596 0.0176 ...
##  $ fractal_dimension_se   : num  0.00619 0.00353 0.00457 0.00921 0.00511 ...
##  $ radius_worst           : num  25.4 25 23.6 14.9 22.5 ...
##  $ texture_worst          : num  17.3 23.4 25.5 26.5 16.7 ...
##  $ perimeter_worst        : num  184.6 158.8 152.5 98.9 152.2 ...
##  $ area_worst             : num  2019 1956 1709 568 1575 ...
##  $ smoothness_worst       : num  0.162 0.124 0.144 0.21 0.137 ...
##  $ compactness_worst      : num  0.666 0.187 0.424 0.866 0.205 ...
##  $ concavity_worst        : num  0.712 0.242 0.45 0.687 0.4 ...
##  $ concave.points_worst   : num  0.265 0.186 0.243 0.258 0.163 ...
##  $ symmetry_worst         : num  0.46 0.275 0.361 0.664 0.236 ...
##  $ fractal_dimension_worst: num  0.1189 0.089 0.0876 0.173 0.0768 ...

summary

##      diagnosis    radius_mean      texture_mean   perimeter_mean  
##  Benign   :357   Min.   : 6.981   Min.   : 9.71   Min.   : 43.79  
##  Malignant:212   1st Qu.:11.700   1st Qu.:16.17   1st Qu.: 75.17  
##                  Median :13.370   Median :18.84   Median : 86.24  
##                  Mean   :14.127   Mean   :19.29   Mean   : 91.97  
##                  3rd Qu.:15.780   3rd Qu.:21.80   3rd Qu.:104.10  
##                  Max.   :28.110   Max.   :39.28   Max.   :188.50  
##    area_mean      smoothness_mean   compactness_mean  concavity_mean   
##  Min.   : 143.5   Min.   :0.05263   Min.   :0.01938   Min.   :0.00000  
##  1st Qu.: 420.3   1st Qu.:0.08637   1st Qu.:0.06492   1st Qu.:0.02956  
##  Median : 551.1   Median :0.09587   Median :0.09263   Median :0.06154  
##  Mean   : 654.9   Mean   :0.09636   Mean   :0.10434   Mean   :0.08880  
##  3rd Qu.: 782.7   3rd Qu.:0.10530   3rd Qu.:0.13040   3rd Qu.:0.13070  
##  Max.   :2501.0   Max.   :0.16340   Max.   :0.34540   Max.   :0.42680  
##  concave.points_mean symmetry_mean    fractal_dimension_mean
##  Min.   :0.00000     Min.   :0.1060   Min.   :0.04996       
##  1st Qu.:0.02031     1st Qu.:0.1619   1st Qu.:0.05770       
##  Median :0.03350     Median :0.1792   Median :0.06154       
##  Mean   :0.04892     Mean   :0.1812   Mean   :0.06280       
##  3rd Qu.:0.07400     3rd Qu.:0.1957   3rd Qu.:0.06612       
##  Max.   :0.20120     Max.   :0.3040   Max.   :0.09744       
##    radius_se        texture_se      perimeter_se       area_se       
##  Min.   :0.1115   Min.   :0.3602   Min.   : 0.757   Min.   :  6.802  
##  1st Qu.:0.2324   1st Qu.:0.8339   1st Qu.: 1.606   1st Qu.: 17.850  
##  Median :0.3242   Median :1.1080   Median : 2.287   Median : 24.530  
##  Mean   :0.4052   Mean   :1.2169   Mean   : 2.866   Mean   : 40.337  
##  3rd Qu.:0.4789   3rd Qu.:1.4740   3rd Qu.: 3.357   3rd Qu.: 45.190  
##  Max.   :2.8730   Max.   :4.8850   Max.   :21.980   Max.   :542.200  
##  smoothness_se      compactness_se      concavity_se    
##  Min.   :0.001713   Min.   :0.002252   Min.   :0.00000  
##  1st Qu.:0.005169   1st Qu.:0.013080   1st Qu.:0.01509  
##  Median :0.006380   Median :0.020450   Median :0.02589  
##  Mean   :0.007041   Mean   :0.025478   Mean   :0.03189  
##  3rd Qu.:0.008146   3rd Qu.:0.032450   3rd Qu.:0.04205  
##  Max.   :0.031130   Max.   :0.135400   Max.   :0.39600  
##  concave.points_se   symmetry_se       fractal_dimension_se
##  Min.   :0.000000   Min.   :0.007882   Min.   :0.0008948   
##  1st Qu.:0.007638   1st Qu.:0.015160   1st Qu.:0.0022480   
##  Median :0.010930   Median :0.018730   Median :0.0031870   
##  Mean   :0.011796   Mean   :0.020542   Mean   :0.0037949   
##  3rd Qu.:0.014710   3rd Qu.:0.023480   3rd Qu.:0.0045580   
##  Max.   :0.052790   Max.   :0.078950   Max.   :0.0298400   
##   radius_worst   texture_worst   perimeter_worst    area_worst    
##  Min.   : 7.93   Min.   :12.02   Min.   : 50.41   Min.   : 185.2  
##  1st Qu.:13.01   1st Qu.:21.08   1st Qu.: 84.11   1st Qu.: 515.3  
##  Median :14.97   Median :25.41   Median : 97.66   Median : 686.5  
##  Mean   :16.27   Mean   :25.68   Mean   :107.26   Mean   : 880.6  
##  3rd Qu.:18.79   3rd Qu.:29.72   3rd Qu.:125.40   3rd Qu.:1084.0  
##  Max.   :36.04   Max.   :49.54   Max.   :251.20   Max.   :4254.0  
##  smoothness_worst  compactness_worst concavity_worst  concave.points_worst
##  Min.   :0.07117   Min.   :0.02729   Min.   :0.0000   Min.   :0.00000     
##  1st Qu.:0.11660   1st Qu.:0.14720   1st Qu.:0.1145   1st Qu.:0.06493     
##  Median :0.13130   Median :0.21190   Median :0.2267   Median :0.09993     
##  Mean   :0.13237   Mean   :0.25427   Mean   :0.2722   Mean   :0.11461     
##  3rd Qu.:0.14600   3rd Qu.:0.33910   3rd Qu.:0.3829   3rd Qu.:0.16140     
##  Max.   :0.22260   Max.   :1.05800   Max.   :1.2520   Max.   :0.29100     
##  symmetry_worst   fractal_dimension_worst
##  Min.   :0.1565   Min.   :0.05504        
##  1st Qu.:0.2504   1st Qu.:0.07146        
##  Median :0.2822   Median :0.08004        
##  Mean   :0.2901   Mean   :0.08395        
##  3rd Qu.:0.3179   3rd Qu.:0.09208        
##  Max.   :0.6638   Max.   :0.20750

4. Correlations and Covariances

4-1) Correlations between all the variables

Mean

Standard Errors

Worst (Mean of Largest 3)

4-2) Correlations between the variables and diagnosis

Mean

Standard Errors

Worst (Mean of Largest 3)
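The two kinds of correlation in this section can be sketched with base R as follows. The data are simulated stand-ins, not the real measurements, and coding the diagnosis as 0/1 before correlating is one common choice that is assumed here.

```r
set.seed(1)

# Simulated stand-in for three of the mean features plus the diagnosis
n <- 200
diagnosis <- factor(sample(c("Benign", "Malignant"), n, replace = TRUE))
radius_mean <- rnorm(n, mean = ifelse(diagnosis == "Malignant", 17, 12), sd = 2)
toy <- data.frame(
  radius_mean    = radius_mean,
  perimeter_mean = 6.5 * radius_mean + rnorm(n, sd = 2),
  texture_mean   = rnorm(n, mean = 19, sd = 4)
)

# Pairwise Pearson correlations between the features
feature_cor <- cor(toy)

# Correlation of each feature with the diagnosis coded as 0/1
diagnosis_num <- as.numeric(diagnosis == "Malignant")
diagnosis_cor <- cor(toy, diagnosis_num)[, 1]

print(round(feature_cor, 2))
print(round(sort(diagnosis_cor, decreasing = TRUE), 2))
```

Feature pairs with a correlation close to 1, such as radius and perimeter in this toy data, are the kind of redundancy that the later PCA section addresses.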

5. Supervised Machine Learning Methods

5-1) Split the data into training and testing sets

5-2) Check proportion of diagnoses within the training and testing sets

Training set

## 
##    Benign Malignant 
## 0.6256281 0.3743719

Testing set

## 
##    Benign Malignant 
## 0.6315789 0.3684211
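A minimal sketch of a random split followed by the proportion check is shown below. It uses only the class counts from the summary (357 benign, 212 malignant); the 70/30 ratio and the seed are assumptions, although a 70% split of 569 rows does leave 171 test rows, which matches the confusion matrices in this section.

```r
set.seed(123)

# Diagnosis labels with the class counts reported in the summary
diagnosis <- factor(rep(c("Benign", "Malignant"), times = c(357, 212)))

# Random 70/30 split of row indices (the 70% fraction is an assumption)
n <- length(diagnosis)
train_idx <- sample(n, size = round(0.7 * n))

train_labels <- diagnosis[train_idx]
test_labels  <- diagnosis[-train_idx]

# Class proportions in each subset
print(prop.table(table(train_labels)))
print(prop.table(table(test_labels)))
```

If the two sets show similar proportions, as they do in the output above, neither set over-represents one diagnosis.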

5-3) Apply Machine Learning Methods

SVM

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       106         1
##   Malignant      2        62
##                                           
##                Accuracy : 0.9825          
##                  95% CI : (0.9496, 0.9964)
##     No Information Rate : 0.6316          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9624          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9815          
##             Specificity : 0.9841          
##          Pos Pred Value : 0.9907          
##          Neg Pred Value : 0.9688          
##              Prevalence : 0.6316          
##          Detection Rate : 0.6199          
##    Detection Prevalence : 0.6257          
##       Balanced Accuracy : 0.9828          
##                                           
##        'Positive' Class : Benign          
## 
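To make the statistics in this output easier to read, the sketch below recomputes them by hand from the four SVM counts. It uses only base R; the variable names are illustrative, and "positive" means Benign, as stated in the output.

```r
# SVM confusion matrix copied from the output above
# (rows = prediction, columns = reference)
cm <- matrix(c(106, 2, 1, 62), nrow = 2,
             dimnames = list(Prediction = c("Benign", "Malignant"),
                             Reference  = c("Benign", "Malignant")))

tp <- cm["Benign", "Benign"]        # predicted Benign, truly Benign
fn <- cm["Malignant", "Benign"]     # predicted Malignant, truly Benign
fp <- cm["Benign", "Malignant"]     # predicted Benign, truly Malignant
tn <- cm["Malignant", "Malignant"]  # predicted Malignant, truly Malignant
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)
npv         <- tn / (tn + fn)
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

metrics <- c(accuracy = accuracy, sensitivity = sensitivity,
             specificity = specificity, ppv = ppv, npv = npv,
             balanced = balanced, kappa = cohen_kappa)
print(round(metrics, 4))
```

The same arithmetic applies to every confusion matrix in this section. Because Benign is the positive class, the specificity is the share of malignant cases that were correctly flagged.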

K Nearest Neighbours

https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       104         5
##   Malignant      4        58
##                                           
##                Accuracy : 0.9474          
##                  95% CI : (0.9024, 0.9757)
##     No Information Rate : 0.6316          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8865          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9630          
##             Specificity : 0.9206          
##          Pos Pred Value : 0.9541          
##          Neg Pred Value : 0.9355          
##              Prevalence : 0.6316          
##          Detection Rate : 0.6082          
##    Detection Prevalence : 0.6374          
##       Balanced Accuracy : 0.9418          
##                                           
##        'Positive' Class : Benign          
## 
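For intuition about the idea behind this classifier, here is a bare-bones nearest-neighbour vote written in base R. It is not the code that produced the results above; the function name `knn_predict`, the toy data and the choice of k = 3 are all hypothetical.

```r
# Classify one query point by majority vote among its k nearest
# training points, using Euclidean distance
knn_predict <- function(train_x, train_y, query, k = 3) {
  dists <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  nearest <- order(dists)[seq_len(k)]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Hypothetical, already-scaled training data with two features
train_x <- matrix(c(-1.0, -0.8,
                    -0.9, -1.1,
                    -1.2, -0.7,
                     1.0,  0.9,
                     1.1,  1.2,
                     0.8,  1.0),
                  ncol = 2, byrow = TRUE)
train_y <- factor(c("Benign", "Benign", "Benign",
                    "Malignant", "Malignant", "Malignant"))

pred_low  <- knn_predict(train_x, train_y, query = c(-1.0, -1.0))
pred_high <- knn_predict(train_x, train_y, query = c(0.9, 1.1))
print(c(pred_low, pred_high))
```

Because the vote depends on distances, features are normally put on a common scale before this method is applied.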

Random Forest


## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       102         3
##   Malignant      6        60
##                                           
##                Accuracy : 0.9474          
##                  95% CI : (0.9024, 0.9757)
##     No Information Rate : 0.6316          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.888           
##                                           
##  Mcnemar's Test P-Value : 0.505           
##                                           
##             Sensitivity : 0.9444          
##             Specificity : 0.9524          
##          Pos Pred Value : 0.9714          
##          Neg Pred Value : 0.9091          
##              Prevalence : 0.6316          
##          Detection Rate : 0.5965          
##    Detection Prevalence : 0.6140          
##       Balanced Accuracy : 0.9484          
##                                           
##        'Positive' Class : Benign          
## 

naiveBayes

The laplace option applies smoothing. If a given class and feature value never occur together in the training data, the frequency-based probability estimate is zero, and multiplying by it wipes out the information carried by all the other probabilities. A small-sample correction (pseudocount) is therefore added so that no probability is ever exactly zero. This is called Laplace smoothing when the pseudocount is 1 and Lidstone smoothing in the general case.
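The effect of the pseudocount can be seen in a small worked example. The counts below are made up, and `smoothed_prob` is a hypothetical helper that applies the standard additive-smoothing formula, (count + alpha) / (total + alpha * number of values).

```r
# Smoothed estimate of P(feature value | class)
smoothed_prob <- function(counts, alpha = 1) {
  (counts + alpha) / (sum(counts) + alpha * length(counts))
}

# Hypothetical counts of a three-level feature within one class;
# the level "high" was never observed together with this class
counts <- c(low = 12, medium = 8, high = 0)

unsmoothed <- smoothed_prob(counts, alpha = 0)    # plain relative frequencies
laplace    <- smoothed_prob(counts, alpha = 1)    # Laplace (add-one) smoothing
lidstone   <- smoothed_prob(counts, alpha = 0.5)  # Lidstone with alpha = 0.5

print(round(rbind(unsmoothed, laplace, lidstone), 4))
```

Without smoothing the unseen level gets probability 0; with a pseudocount of 1 it gets 1/23, and the three probabilities still sum to 1.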

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       101         6
##   Malignant      7        57
##                                           
##                Accuracy : 0.924           
##                  95% CI : (0.8735, 0.9589)
##     No Information Rate : 0.6316          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8372          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9352          
##             Specificity : 0.9048          
##          Pos Pred Value : 0.9439          
##          Neg Pred Value : 0.8906          
##              Prevalence : 0.6316          
##          Detection Rate : 0.5906          
##    Detection Prevalence : 0.6257          
##       Balanced Accuracy : 0.9200          
##                                           
##        'Positive' Class : Benign          
## 

Decision Trees (using the C5.0 algorithm)

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       103         4
##   Malignant      5        59
##                                           
##                Accuracy : 0.9474          
##                  95% CI : (0.9024, 0.9757)
##     No Information Rate : 0.6316          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8873          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9537          
##             Specificity : 0.9365          
##          Pos Pred Value : 0.9626          
##          Neg Pred Value : 0.9219          
##              Prevalence : 0.6316          
##          Detection Rate : 0.6023          
##    Detection Prevalence : 0.6257          
##       Balanced Accuracy : 0.9451          
##                                           
##        'Positive' Class : Benign          
## 

Decision Trees using recursive partitioning

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign        97         7
##   Malignant     11        56
##                                           
##                Accuracy : 0.8947          
##                  95% CI : (0.8387, 0.9364)
##     No Information Rate : 0.6316          
##     P-Value [Acc > NIR] : 5.497e-15       
##                                           
##                   Kappa : 0.7768          
##                                           
##  Mcnemar's Test P-Value : 0.4795          
##                                           
##             Sensitivity : 0.8981          
##             Specificity : 0.8889          
##          Pos Pred Value : 0.9327          
##          Neg Pred Value : 0.8358          
##              Prevalence : 0.6316          
##          Detection Rate : 0.5673          
##    Detection Prevalence : 0.6082          
##       Balanced Accuracy : 0.8935          
##                                           
##        'Positive' Class : Benign          
## 

Decision Trees using pruning method

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign        96         5
##   Malignant     12        58
##                                          
##                Accuracy : 0.9006         
##                  95% CI : (0.8456, 0.941)
##     No Information Rate : 0.6316         
##     P-Value [Acc > NIR] : 1.085e-15      
##                                          
##                   Kappa : 0.7912         
##                                          
##  Mcnemar's Test P-Value : 0.1456         
##                                          
##             Sensitivity : 0.8889         
##             Specificity : 0.9206         
##          Pos Pred Value : 0.9505         
##          Neg Pred Value : 0.8286         
##              Prevalence : 0.6316         
##          Detection Rate : 0.5614         
##    Detection Prevalence : 0.5906         
##       Balanced Accuracy : 0.9048         
##                                          
##        'Positive' Class : Benign         
## 

5-4) Visualisation to compare accuracies

5-5) Select best prediction model
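One simple way to compare the models is to rank the test-set accuracies reported in 5-3. The sketch below copies those figures into a named vector; the labels are shorthand chosen here, and ranking by accuracy alone is an assumption about the selection criterion.

```r
# Test-set accuracies copied from the confusion matrix outputs in 5-3
accuracy <- c(
  "SVM"                    = 0.9825,
  "K Nearest Neighbours"   = 0.9474,
  "Random Forest"          = 0.9474,
  "Naive Bayes"            = 0.9240,
  "C5.0"                   = 0.9474,
  "Recursive partitioning" = 0.8947,
  "Pruned tree"            = 0.9006
)

ranked <- sort(accuracy, decreasing = TRUE)
best_model <- names(ranked)[1]

print(ranked)
cat("Highest test accuracy:", best_model, "\n")
```

With only 171 test rows the 95% confidence intervals overlap, for example (0.9496, 0.9964) for SVM and (0.9024, 0.9757) for K Nearest Neighbours, so the ranking should be read as indicative rather than conclusive.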

6. Unsupervised Machine Learning Methods: Principal Component Analysis (PCA)

PCA collapses the variables into fewer dimensions that explain the majority of the variance.

The correlation matrices show multicollinearity among the features, which makes them good candidates for PCA.

PCA is run on standardized data, which avoids distortion by differences in scale.
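The point about scale can be demonstrated with base R's `prcomp`. The sketch below uses simulated features whose names and magnitudes only loosely imitate the real ones; it compares PCA on raw values with PCA on standardized values.

```r
set.seed(42)

# Three simulated features on very different scales (illustrative only)
n <- 300
radius <- rnorm(n, mean = 14, sd = 3.5)
toy <- data.frame(
  radius_mean     = radius,
  area_mean       = 45 * radius + rnorm(n, sd = 60),
  smoothness_mean = rnorm(n, mean = 0.096, sd = 0.014)
)

pca_raw    <- prcomp(toy, center = TRUE, scale. = FALSE)
pca_scaled <- prcomp(toy, center = TRUE, scale. = TRUE)

# Share of total variance captured by the first component
share_raw    <- pca_raw$sdev[1]^2 / sum(pca_raw$sdev^2)
share_scaled <- pca_scaled$sdev[1]^2 / sum(pca_scaled$sdev^2)

print(round(c(raw = share_raw, scaled = share_scaled), 3))
```

On the raw data the first component is driven almost entirely by the feature with the largest numeric range; after standardizing, each feature contributes one unit of variance and the components reflect correlation structure instead.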

6-1) Summary

The number of principal components to keep can be read from the PCA summary: it is the smallest number of components whose cumulative proportion of variance reaches 85% or more.

All

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     3.6444 2.3857 1.67867 1.40735 1.28403 1.09880
## Proportion of Variance 0.4427 0.1897 0.09393 0.06602 0.05496 0.04025
## Cumulative Proportion  0.4427 0.6324 0.72636 0.79239 0.84734 0.88759
##                            PC7     PC8    PC9    PC10   PC11    PC12
## Standard deviation     0.82172 0.69037 0.6457 0.59219 0.5421 0.51104
## Proportion of Variance 0.02251 0.01589 0.0139 0.01169 0.0098 0.00871
## Cumulative Proportion  0.91010 0.92598 0.9399 0.95157 0.9614 0.97007
##                           PC13    PC14    PC15    PC16    PC17    PC18
## Standard deviation     0.49128 0.39624 0.30681 0.28260 0.24372 0.22939
## Proportion of Variance 0.00805 0.00523 0.00314 0.00266 0.00198 0.00175
## Cumulative Proportion  0.97812 0.98335 0.98649 0.98915 0.99113 0.99288
##                           PC19    PC20   PC21    PC22    PC23   PC24
## Standard deviation     0.22244 0.17652 0.1731 0.16565 0.15602 0.1344
## Proportion of Variance 0.00165 0.00104 0.0010 0.00091 0.00081 0.0006
## Cumulative Proportion  0.99453 0.99557 0.9966 0.99749 0.99830 0.9989
##                           PC25    PC26    PC27    PC28    PC29    PC30
## Standard deviation     0.12442 0.09043 0.08307 0.03987 0.02736 0.01153
## Proportion of Variance 0.00052 0.00027 0.00023 0.00005 0.00002 0.00000
## Cumulative Proportion  0.99942 0.99969 0.99992 0.99997 1.00000 1.00000

Mean

The cumulative proportion from PC1 to PC3 is about 88.8% (above 85%).

## Importance of components:
##                           PC1    PC2     PC3    PC4     PC5     PC6
## Standard deviation     2.3406 1.5870 0.93841 0.7064 0.61036 0.35234
## Proportion of Variance 0.5479 0.2519 0.08806 0.0499 0.03725 0.01241
## Cumulative Proportion  0.5479 0.7997 0.88779 0.9377 0.97495 0.98736
##                            PC7     PC8     PC9    PC10
## Standard deviation     0.28299 0.18679 0.10552 0.01680
## Proportion of Variance 0.00801 0.00349 0.00111 0.00003
## Cumulative Proportion  0.99537 0.99886 0.99997 1.00000

SE

The cumulative proportion from PC1 to PC4 is about 86.8% (above 85%).

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6
## Standard deviation     2.1779 1.4406 1.1245 0.77095 0.75991 0.57939
## Proportion of Variance 0.4743 0.2075 0.1264 0.05944 0.05775 0.03357
## Cumulative Proportion  0.4743 0.6819 0.8083 0.86774 0.92548 0.95905
##                            PC7    PC8     PC9    PC10
## Standard deviation     0.43512 0.3962 0.20436 0.14635
## Proportion of Variance 0.01893 0.0157 0.00418 0.00214
## Cumulative Proportion  0.97798 0.9937 0.99786 1.00000

Worst

The cumulative proportion from PC1 to PC3 is about 85.9% (above 85%).

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     2.3869 1.4443 0.89597 0.73531 0.71741 0.42862
## Proportion of Variance 0.5697 0.2086 0.08028 0.05407 0.05147 0.01837
## Cumulative Proportion  0.5697 0.7783 0.85860 0.91267 0.96413 0.98251
##                            PC7     PC8     PC9    PC10
## Standard deviation     0.28959 0.26802 0.12343 0.06326
## Proportion of Variance 0.00839 0.00718 0.00152 0.00040
## Cumulative Proportion  0.99089 0.99808 0.99960 1.00000
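The proportions in these tables follow directly from the standard deviations, and the 85% rule can be applied programmatically. The sketch below uses the standard deviations printed for the mean features; the variable names are illustrative.

```r
# Standard deviations of the components for the mean features,
# copied from the "Mean" table in this section
sdev <- c(2.3406, 1.5870, 0.93841, 0.7064, 0.61036,
          0.35234, 0.28299, 0.18679, 0.10552, 0.01680)

# The variance of each component is its squared standard deviation
prop_var <- sdev^2 / sum(sdev^2)
cum_var  <- cumsum(prop_var)

# Smallest number of components reaching the 85% threshold
n_keep <- which(cum_var >= 0.85)[1]

print(round(rbind(prop_var, cum_var), 4))
cat("Components needed for >= 85%:", n_keep, "\n")
```

For the mean features this selects three components, in line with the cumulative proportion noted above the table.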

6-2) Screeplot

A scree plot shows the percentage of variability explained by each principal component.

What to look at: the principal component at which the line lies.
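A scree plot can be drawn with base graphics, as in the sketch below. The proportions are the first ten values from the "All" summary table in 6-1; the vertical marker at PC6 is an assumption about how the line in the original plot was drawn.

```r
# Proportion of variance for the first ten components of the full PCA,
# copied from the "All" summary table
prop_var <- c(0.4427, 0.1897, 0.09393, 0.06602, 0.05496,
              0.04025, 0.02251, 0.01589, 0.0139, 0.01169)

pdf(NULL)  # null device, so the sketch also runs non-interactively
plot(seq_along(prop_var), 100 * prop_var, type = "b",
     xlab = "Principal component", ylab = "% of variance explained",
     main = "Scree plot (first ten components)")
abline(v = 6, lty = 2)  # marker at PC6 (assumed annotation)
dev.off()
```

Removing the `pdf(NULL)` and `dev.off()` lines shows the plot in an interactive session.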

All

Line lies at point PC6

Mean

Line lies at point PC4

SE

Line lies at point PC4

Worst

Line lies at point PC4

6-3) Get the PCA

## Principal Component Analysis Results for variables
##  ===================================================
##   Name       Description                                    
## 1 "$coord"   "Coordinates for the variables"                
## 2 "$cor"     "Correlations between variables and dimensions"
## 3 "$cos2"    "Cos2 for the variables"                       
## 4 "$contrib" "contributions of the variables"
##      Dim.1             Dim.2               Dim.3          
##  Min.   :0.02112   Min.   : 0.006818   Min.   : 0.000747  
##  1st Qu.:1.54221   1st Qu.: 0.998304   1st Qu.: 0.166172  
##  Median :3.73990   Median : 3.174444   Median : 2.648693  
##  Mean   :3.33333   Mean   : 3.333333   Mean   : 3.333333  
##  3rd Qu.:5.14716   3rd Qu.: 4.766218   3rd Qu.: 5.534672  
##  Max.   :6.80447   Max.   :13.437758   Max.   :14.035038  
##      Dim.4              Dim.5               Dim.6          
##  Min.   : 0.00017   Min.   : 0.001942   Min.   : 0.000008  
##  1st Qu.: 0.06915   1st Qu.: 0.188920   1st Qu.: 0.042172  
##  Median : 0.21778   Median : 1.473744   Median : 0.143640  
##  Mean   : 3.33333   Mean   : 3.333333   Mean   : 3.333333  
##  3rd Qu.: 0.74240   3rd Qu.: 5.832032   3rd Qu.: 0.602393  
##  Max.   :40.04458   Max.   :13.328963   Max.   :24.892794  
##      Dim.7              Dim.8              Dim.9          
##  Min.   : 0.00002   Min.   : 0.00555   Min.   : 0.004128  
##  1st Qu.: 0.29158   1st Qu.: 0.13098   1st Qu.: 1.214586  
##  Median : 1.24746   Median : 0.66814   Median : 2.063070  
##  Mean   : 3.33333   Mean   : 3.33333   Mean   : 3.333333  
##  3rd Qu.: 4.18635   3rd Qu.: 2.93295   3rd Qu.: 5.121402  
##  Max.   :14.03683   Max.   :32.87993   Max.   :12.824068  
##      Dim.10             Dim.11             Dim.12         
##  Min.   : 0.00649   Min.   : 0.01731   Min.   : 0.004172  
##  1st Qu.: 0.31137   1st Qu.: 0.57090   1st Qu.: 0.184512  
##  Median : 0.74500   Median : 1.97397   Median : 0.550435  
##  Mean   : 3.33333   Mean   : 3.33333   Mean   : 3.333333  
##  3rd Qu.: 2.44026   3rd Qu.: 3.80525   3rd Qu.: 6.830005  
##  Max.   :32.72635   Max.   :12.21628   Max.   :13.813302  
##      Dim.13             Dim.14              Dim.15         
##  Min.   : 0.01008   Min.   : 0.002868   Min.   : 0.006836  
##  1st Qu.: 0.34923   1st Qu.: 0.048205   1st Qu.: 0.252498  
##  Median : 1.82371   Median : 0.400154   Median : 1.340680  
##  Mean   : 3.33333   Mean   : 3.333333   Mean   : 3.333333  
##  3rd Qu.: 4.19080   3rd Qu.: 3.886428   3rd Qu.: 3.981817  
##  Max.   :24.40624   Max.   :24.182390   Max.   :26.829883  
##      Dim.16             Dim.17              Dim.18         
##  Min.   : 0.07031   Min.   : 0.000255   Min.   : 0.000004  
##  1st Qu.: 0.70436   1st Qu.: 0.208020   1st Qu.: 0.244339  
##  Median : 2.42537   Median : 1.945913   Median : 1.044597  
##  Mean   : 3.33333   Mean   : 3.333333   Mean   : 3.333333  
##  3rd Qu.: 3.86531   3rd Qu.: 5.141920   3rd Qu.: 4.391202  
##  Max.   :16.44761   Max.   :18.546641   Max.   :25.155408  
##      Dim.19              Dim.20              Dim.21        
##  Min.   : 0.000514   Min.   : 0.003166   Min.   : 0.00008  
##  1st Qu.: 0.098970   1st Qu.: 0.223798   1st Qu.: 0.41567  
##  Median : 0.808752   Median : 0.818531   Median : 0.84979  
##  Mean   : 3.333333   Mean   : 3.333333   Mean   : 3.33333  
##  3rd Qu.: 5.394712   3rd Qu.: 4.493252   3rd Qu.: 2.70000  
##  Max.   :17.353249   Max.   :23.881433   Max.   :35.33591  
##      Dim.22             Dim.23             Dim.24         
##  Min.   : 0.02966   Min.   : 0.00003   Min.   : 0.001041  
##  1st Qu.: 0.42590   1st Qu.: 0.16634   1st Qu.: 0.157351  
##  Median : 0.93373   Median : 0.55925   Median : 1.118307  
##  Mean   : 3.33333   Mean   : 3.33333   Mean   : 3.333333  
##  3rd Qu.: 3.20759   3rd Qu.: 2.00915   3rd Qu.: 2.363536  
##  Max.   :22.04032   Max.   :32.20878   Max.   :31.214744  
##      Dim.25             Dim.26              Dim.27         
##  Min.   : 0.02541   Min.   : 0.002043   Min.   : 0.008194  
##  1st Qu.: 0.16126   1st Qu.: 0.065086   1st Qu.: 0.047366  
##  Median : 0.82932   Median : 0.892586   Median : 0.603241  
##  Mean   : 3.33333   Mean   : 3.333333   Mean   : 3.333333  
##  3rd Qu.: 3.04470   3rd Qu.: 3.515281   3rd Qu.: 4.200174  
##  Max.   :38.98566   Max.   :30.126505   Max.   :21.772720  
##      Dim.28             Dim.29             Dim.30        
##  Min.   : 0.00000   Min.   : 0.00000   Min.   : 0.00001  
##  1st Qu.: 0.00322   1st Qu.: 0.00536   1st Qu.: 0.00013  
##  Median : 0.03508   Median : 0.01462   Median : 0.00131  
##  Mean   : 3.33333   Mean   : 3.33333   Mean   : 3.33333  
##  3rd Qu.: 0.59274   3rd Qu.: 0.15723   3rd Qu.: 0.05211  
##  Max.   :53.09759   Max.   :40.41462   Max.   :49.33856

6-4) Contributions of variables to PCA

Quality of representation of PCA

Contributions of variables to PCA
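A variable's contribution to a component can be computed from a base R PCA as 100 times its squared loading. The sketch below uses simulated data with five features; it also shows why every column in the contribution summary in 6-3 (which appears to summarise these percentages) has a mean of 3.333, that is 100 divided by the 30 features.

```r
set.seed(7)

# Simulated stand-in for a numeric feature table (five features)
x <- matrix(rnorm(100 * 5), ncol = 5,
            dimnames = list(NULL, paste0("feature_", 1:5)))
x[, 2] <- x[, 1] + 0.3 * x[, 2]   # make two features correlated

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Contribution (%) of a variable to a component: its squared loading,
# because each column of the rotation matrix has unit length
contrib <- 100 * pca$rotation^2

print(round(contrib, 2))
print(colSums(contrib))    # contributions to each component sum to 100
print(colMeans(contrib))   # so the mean contribution is 100 / 5 = 20
```

A variable whose contribution to a component exceeds the uniform value (20% here, 3.33% for 30 features) contributes more than average to that component.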

6-5) Biplot of PCAs with diagnosis

7. Factor Analysis

Importing the data

## Parallel analysis suggests that the number of factors =  2  and the number of components =  NA
## Factor Analysis using method =  ml
## Call: fa(r = wbcd_sem[, c(2:11)], nfactors = 3, rotate = "oblimin", 
##     fm = "ml")
## 
##  Warning: A Heywood case was detected. 
## Standardized loadings (pattern matrix) based upon correlation matrix
##                          ML1   ML2   ML3   h2     u2 com
## radius_mean             1.00  0.01 -0.02 1.00 0.0028 1.0
## texture_mean            0.33  0.15 -0.09 0.12 0.8841 1.5
## perimeter_mean          0.99  0.06 -0.01 1.00 0.0024 1.0
## area_mean               0.98 -0.04  0.04 0.98 0.0210 1.0
## smoothness_mean        -0.08  0.21  0.61 0.57 0.4259 1.3
## compactness_mean        0.35  0.80  0.09 1.00 0.0050 1.4
## concavity_mean          0.48  0.32  0.40 0.90 0.1042 2.7
## concave.points_mean     0.62  0.07  0.53 1.00 0.0050 2.0
## symmetry_mean          -0.03  0.36  0.34 0.42 0.5758 2.0
## fractal_dimension_mean -0.50  0.69  0.21 0.81 0.1928 2.0
## 
##                        ML1  ML2  ML3
## SS loadings           4.22 1.95 1.61
## Proportion Var        0.42 0.19 0.16
## Cumulative Var        0.42 0.62 0.78
## Proportion Explained  0.54 0.25 0.21
## Cumulative Proportion 0.54 0.79 1.00
## 
##  With factor correlations of 
##      ML1  ML2  ML3
## ML1 1.00 0.16 0.36
## ML2 0.16 1.00 0.78
## ML3 0.36 0.78 1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  45  and the objective function was  19.82 with Chi Square of  11176.18
## The degrees of freedom for the model are 18  and the objective function was  1.81 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.04 
## 
## The harmonic number of observations is  569 with the empirical chi square  36.43  with prob <  0.0062 
## The total number of observations was  569  with Likelihood Chi Square =  1017.14  with prob <  1.5e-204 
## 
## Tucker Lewis Index of factoring reliability =  0.775
## RMSEA index =  0.314  and the 90 % confidence intervals are  0.296 0.329
## BIC =  902.95
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                   ML1  ML2  ML3
## Correlation of (regression) scores with factors     1 1.00 0.99
## Multiple R square of scores with factors            1 0.99 0.98
## Minimum correlation of possible factor scores       1 0.98 0.96
## 
## Loadings:
##                        ML1    ML2    ML3   
## radius_mean             1.004              
## texture_mean            0.333              
## perimeter_mean          0.990              
## area_mean               0.980              
## smoothness_mean                       0.611
## compactness_mean        0.346  0.805       
## concavity_mean          0.485  0.324  0.401
## concave.points_mean     0.621         0.535
## symmetry_mean                  0.364  0.336
## fractal_dimension_mean -0.498  0.690       
## 
##                  ML1   ML2   ML3
## SS loadings    4.055 1.436 0.996
## Proportion Var 0.406 0.144 0.100
## Cumulative Var 0.406 0.549 0.649
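Several rows of the factor analysis output can be reproduced by simple arithmetic, which helps when reading it. The sketch below uses figures copied from the tables above; the variable names are illustrative, and the two small ML1 loadings hidden in the second table are taken from the rounded pattern matrix.

```r
# SS loadings for the three factors, from the first summary table
ss_loadings <- c(ML1 = 4.22, ML2 = 1.95, ML3 = 1.61)
n_vars <- 10  # ten mean features entered the factor analysis

# Share of total variance (each standardized variable has variance 1)
prop_var <- ss_loadings / n_vars
cum_var  <- cumsum(prop_var)

# Share of the variance that the three factors explain together
prop_explained <- ss_loadings / sum(ss_loadings)

print(round(rbind(prop_var, cum_var, prop_explained), 3))

# Communality (h2) and uniqueness (u2) add up to 1, e.g. texture_mean
h2_texture <- 1 - 0.8841

# The second table's SS loadings are plain sums of squared loadings:
# ML1 column, with -0.08 and -0.03 filled in from the pattern matrix
ml1 <- c(1.004, 0.333, 0.990, 0.980, -0.08,
         0.346, 0.485, 0.621, -0.03, -0.498)
ss_ml1_plain <- sum(ml1^2)

print(round(c(h2_texture = h2_texture, ss_ml1_plain = ss_ml1_plain), 3))
```

The plain sum of squares gives about 4.055, matching the second table. The first table's 4.22 differs, which is consistent with the first table also taking the correlations between the oblique factors into account.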